Sentence retrieval for abstracts of randomized controlled trials
Abstract
BACKGROUND The practice of evidence-based medicine (EBM) requires clinicians to integrate their expertise with the latest scientific research. This is becoming increasingly difficult as the number of published articles grows. There is a clear need for better tools to improve clinicians' ability to search the primary literature. Randomized controlled trials (RCTs) are the most reliable source of evidence documenting the efficacy of treatment options. This paper describes the retrieval of key sentences from abstracts of RCTs as a step towards helping users find relevant facts about the experimental design of clinical studies.
METHOD Sentences referring to Intervention, Participants and Outcome Measures are categorized automatically using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems. This extends a previous approach that labels the sentences of an abstract with general categories associated with scientific argumentation, or rhetorical roles: Aim, Method, Results and Conclusion. The methods are tested on several corpora of RCT abstracts. First, structured abstracts whose headings specifically indicate Intervention, Participants and Outcome Measures are used. In addition, a manually annotated corpus of structured and unstructured abstracts is prepared for testing a classifier that identifies the sentences belonging to each category.
RESULTS Using CRFs, sentences can be labeled with the four rhetorical roles at F-scores of 0.93 to 0.98, outperforming Support Vector Machines (SVMs). Furthermore, sentences can be labeled automatically for Intervention, Participants and Outcome Measures in unstructured abstracts and in structured abstracts whose section headings do not specifically indicate these three topics. F-scores of up to 0.83 and 0.84 are obtained for Intervention and Outcome Measure sentences, respectively.
CONCLUSION Results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstract reports. This is promising because automatically labeled sentences could potentially form concise summaries and assist in information retrieval and finer-grained extraction.
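The sequence-labeling idea in the METHOD section can be illustrated with Viterbi decoding for a linear-chain CRF. This is a minimal sketch, not the authors' implementation. The per-sentence emission scores are hypothetical hand-set numbers that stand in for learned feature weights. The transition scheme is also an illustrative assumption: staying in a role or advancing to the next one is free, and any other move costs 2.0. The example shows how transition scores can override a label that looks best for a sentence in isolation. A classifier that labels each sentence independently does not take this context into account.

```python
from typing import Dict, List, Sequence, Tuple

# Rhetorical roles from the abstract, in their usual order of appearance.
LABELS = ["Aim", "Method", "Results", "Conclusion"]


def viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: Sequence[str] = LABELS,
) -> List[str]:
    """Highest-scoring label sequence under a linear-chain CRF score.

    emissions[i][y]: score of label y for sentence i (hypothetical values here).
    transitions[(p, y)]: score for label p followed by label y (missing pairs = 0.0).
    """
    if not emissions:
        return []
    # best[i][y]: best total score of any labeling of sentences 0..i that ends in y.
    best: List[Dict[str, float]] = [{y: emissions[0][y] for y in labels}]
    # back[i][y]: the previous label on that best path.
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emissions[i][y]
            back[i][y] = prev
    # Start from the best final label and follow the back-pointers.
    y = max(labels, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(len(emissions) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


# Hypothetical transition scores: staying in a role or moving to the next one
# is free; any other move (skipping ahead or going backwards) costs 2.0.
ORDER = {lab: k for k, lab in enumerate(LABELS)}
TRANSITIONS = {
    (p, y): 0.0 if ORDER[y] - ORDER[p] in (0, 1) else -2.0
    for p in LABELS
    for y in LABELS
}

# Hypothetical emission scores for a four-sentence abstract. Viewed in
# isolation, sentence 2 looks slightly more like "Aim" than "Results".
EMISSIONS = [
    {"Aim": 2.0, "Method": 0.5, "Results": 0.0, "Conclusion": 0.0},
    {"Aim": 0.0, "Method": 2.0, "Results": 0.5, "Conclusion": 0.0},
    {"Aim": 1.0, "Method": 0.0, "Results": 0.8, "Conclusion": 0.0},
    {"Aim": 0.0, "Method": 0.0, "Results": 0.5, "Conclusion": 2.0},
]

independent = [max(e, key=e.get) for e in EMISSIONS]
print(independent)                      # ['Aim', 'Method', 'Aim', 'Conclusion']
print(viterbi(EMISSIONS, TRANSITIONS))  # ['Aim', 'Method', 'Results', 'Conclusion']
```

The transition scores penalize moving from Method back to Aim. As a result, the decoder labels sentence 2 as Results, even though Aim has the higher isolated score. A trained CRF would learn both kinds of scores from annotated abstracts rather than use hand-set values.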
Similar resources
Categorization of Sentence Types in Medical Abstracts
This study evaluated the use of machine learning techniques in the classification of sentence type. 7253 structured abstracts and 204 unstructured abstracts of Randomized Controlled Trials from MedLINE were parsed into sentences and each sentence was labeled as one of four types (Introduction, Method, Result, or Conclusion). Support Vector Machine (SVM) and Linear Classifier models were generat...
A comparison of handsearching versus MEDLINE searching to identify reports of randomized controlled trials.
This study aims to compare handsearching to a basic MEDLINE search for the identification of reports of randomized trials in specialized health care journals. Twenty-two specialized health care journals, published in the U.K., were handsearched for all reports of controlled trials (as defined by the Cochrane Collaboration). The reports of trials, which were judged to be definitely randomized, w...
PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts
We present PubMed 200k RCT, a new dataset based on PubMed for sequential sentence classification. The dataset consists of approximately 200,000 abstracts of randomized controlled trials, totaling 2.3 million sentences. Each sentence of each abstract is labeled with its role in the abstract using one of the following classes: background, objective, method, result, or conclusion. The purpose o...
Important considerations in calculating and reporting of sample size in randomized controlled trials
Background: The calculation of the sample size is one of the most important steps in designing a randomized controlled trial. The purpose of this study is drawing the attention of researchers to the importance of calculating and reporting the sample size in randomized controlled trials. Methods: We reviewed related literature and guidelines and discussed some important issues in s...
How methodology is reported in randomized clinical trials
Introduction: Abstract writing is one of the secondary services for summarizing the content of documents. It represents the major information and is used as an overview of the text. However, abstracts should be written and indexed on the basis of some criteria to provide sufficient and reliable information about the main text. This study aimed to assess the abstracts of Randomized Controlled C...
Volume 9
Publication date: 2009